Reading dies in complexity: Online news consumers prefer simple writing

Over 30,000 field experiments with The Washington Post and Upworthy showed that readers prefer simpler headlines (e.g., more common words and more readable writing) over more complex ones. A follow-up mechanism experiment showed that readers from the general public paid more attention to, and processed more deeply, the simpler headlines compared to the complex headlines. That is, a signal detection study suggested readers were guided by a simpler-writing heuristic, such that they skipped over relatively complex headlines to focus their attention on the simpler headlines. Notably, a sample of professional writers, including journalists, did not show this pattern, suggesting that those writing the news may read it differently from those consuming it. Simplifying writing can help news outlets compete in the online attention economy, and simple language can make news more approachable to online readers.

low scores). Note that the output for the Meaning Extraction Method is a binary matrix representing the presence or absence of a particular term in each headline. For example, if the word Putin appeared in Headline A, that headline would receive a score of 1 for the term. If the word Putin did not appear in Headline B, that headline would receive a score of 0. This binary output (0 or 1) follows best practices for the Meaning Extraction Method. For words to be retained in this analysis, they must have appeared in at least 1% of the headlines in each dataset.
The relationship between simplicity and CTR was statistically significant after accounting for each theme as a fixed effect in the prior linear mixed model calculation (B = 0.008, SE = 0.001, t = 8.81, p < .001). Headlines that related more to the White House (p < .001), less to climate change (p = .013), and less to COVID-19 (p = .047) tended to receive a higher click-through rate. Together, this evidence suggests our simplicity effects are robust to content and other covariates in a legacy, traditional journalistic outlet.
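The binary term-by-headline matrix and the 1% retention rule described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `binary_term_matrix`, the whitespace tokenizer, and the toy headlines are assumptions for illustration, and the preprocessing that the Meaning Extraction Method normally involves (e.g., lemmatizing, removing function words, extracting bigrams and trigrams) is omitted.

```python
from collections import Counter

def binary_term_matrix(headlines, min_share=0.01):
    """Return (terms, matrix) where matrix[i][j] is 1 if terms[j]
    occurs in headlines[i] and 0 otherwise."""
    # Crude tokenization (assumption): lowercase and split on whitespace.
    token_sets = [set(h.lower().split()) for h in headlines]
    # Document frequency: number of headlines containing each term.
    doc_freq = Counter(term for tokens in token_sets for term in tokens)
    # Retain terms that appear in at least `min_share` of the headlines.
    threshold = min_share * len(headlines)
    terms = sorted(t for t, n in doc_freq.items() if n >= threshold)
    matrix = [[1 if t in tokens else 0 for t in terms] for tokens in token_sets]
    return terms, matrix

# Hypothetical toy headlines; the 50% threshold is used only because the
# example has three headlines (the paper's rule is 1%).
headlines = [
    "Putin meets advisers",
    "Markets rally again",
    "Putin addresses markets",
]
terms, matrix = binary_term_matrix(headlines, min_share=0.5)
```

Here only terms present in at least half of the toy headlines are retained, so each row of `matrix` records presence (1) or absence (0) of the surviving terms for one headline.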

Study Set 2: Additional Information

Analytic Plan
We first used a linear mixed model to associate the simplicity index with CPI, followed by separate models for each language dimension of interest. Each model contained a random intercept for A/B test since headlines within each test were not independent.

Results
We provide a visual description of the minimum and maximum analyses reported in the main text for the simplicity index (see fig. S1, right panel). The bivariate relationships between simplicity index items and clicks-per-impression based on the minimum and maximum analyses were as follows: common words (r = .152, p < .001), analytic writing (r = -.037, p < .001), readability (r = .023, p < .001), and character count (r = .089, p < .001). We also provide tables of the linear mixed model results for transparency (table S4).

Alternative Explanations
Consistent with Study set 1, we deviated from our preregistered analysis plan to evaluate the degree to which our simplicity effects were robust to content. We extracted five dominant themes from the headlines using the Meaning Extraction Method approach described earlier.
In a linear mixed model controlling for such themes as fixed effects, the evidence suggested simplicity remained positively associated with CPI (B = 0.002, SE = 0.001, t = 3.16, p = .002). Themes related to race/ethnicity, question-asking, gender, and watching videos were positively associated with CPI (ps < .001), and the theme of societal problems was negatively associated with CPI (p = .003). Again, with a new study using a vastly different and nontraditional journalistic outlet, the evidence suggests linguistic simplicity is associated with engagement above and beyond content effects (see table S8).

Commentary on Statistical Re-Expression
Readers will notice several analyses in this paper used variables that were re-expressed (natural log-transformed). This was purposeful, and followed best practices once certain variables were identified as skewed.
The specific re-expression formulae depended on the statistical test under consideration. For example, Study sets 1 and 2 had two main analyses: (1) correlational analyses, and (2) linear mixed models. The correlational analyses added a constant to each variable (value = 1) before the log transformation, that is, ln(X + 1); when presented in a scatterplot, this re-expression represented the data best and retained the greatest number of A/B tests in the analyses. Other re-expressions, including ln(X + .1) or ln(X + .01), produced conceptually equivalent results (and larger effect sizes), but substantially reduced the number of tests under consideration due to the presence of impossible values (dividing by zero). We therefore decided to use the formula with the constant equal to 1 for transparency and generalizability.
In the linear mixed models, re-expression of the dependent variables was based on the authors' interpretation of Q-Q plots and familiarity with similar data structure. We offer this commentary in the spirit of transparency.
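As a brief illustration of the re-expression described above, the sketch below applies ln(X + c) with c = 1 to a set of hypothetical raw values. The function name `reexpress` and the example values are assumptions for illustration only; they are not taken from the study data.

```python
import math

def reexpress(values, constant=1.0):
    """Natural-log re-expression ln(x + constant) for non-negative values."""
    return [math.log(x + constant) for x in values]

# Hypothetical right-skewed values, including a zero.
raw = [0, 3, 12, 250]
transformed = reexpress(raw)              # constant = 1, so zero maps to ln(1) = 0
small_constant = reexpress(raw, 0.01)     # smaller constant pushes zero far below 0
```

With a constant of 1, a raw value of zero maps exactly to zero and the ordering of the values is preserved; with smaller constants such as .01, zeros are mapped to large negative values, which stretches the lower tail of the distribution.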

Procedure
If participants consented to participate, they were randomly assigned via Qualtrics software to one of two experimental conditions that presented either a simple (n = 258) or complex (n = 266) set of 10 news headlines. Six of these headlines were directly taken from The Washington Post (control headlines) and contained higher than average complexity. The other four headlines (target headlines) were modified versions of original Washington Post headlines. To make these sets, the authors first selected two Washington Post headlines that were in the top 1% of the simple headlines provided, and then did the same with two headlines that were in the bottom 3%. With these original headlines as templates, a thesaurus was used to replace original words with their more complex (or simple) counterparts. This approach has been taken in other research using a language complexity manipulation (22, 50). When these headline pairings were created, care was taken to keep headline word counts as consistent as possible (within two words of one another, see table S5). This approach allowed us to vary the complexity of headline language without modifying the substance, or form, of the original headline.
During this headline task, participants were provided with the following prompt: "On the next page you will view 10 news headlines. Imagine that you were browsing the home screen of a newspaper on your computer or reading a newspaper at home. We are interested in knowing which headline you would be likely to click on. When you are ready to proceed to the next page, please click the advance button below." Importantly, participants were not informed beforehand that they were going to be asked about these headlines again.
After participants selected the headline they would be likely to read, they went on to answer a series of filler items. These items asked general questions about news reading and interest in the news. The purpose of these questions was to provide a distractor task in between the headline selection task and the signal detection task described below. After the signal detection task, participants provided their demographic information. This demographic information is not provided in the data available on the OSF site in order to protect participant identity. This information can be made available, however, upon request. On average, this survey took 9.30 minutes to complete (SD = 11.63 minutes, median = 6.70 minutes).

Signal Detection Task
The purpose of this section is to provide more detail regarding the signal detection task. First, across both the simple and complex language conditions, participants completed the same 24-item set. Of these 24 items, 12 were phrases that were coded as either a hit or a foil depending on condition assignment (e.g., "should work out" vs. "prudent to exercise"). Of the remaining 12 items, six were hits common to both conditions and six were foils common to both conditions. The instructions preceding this task stated, "We are now interested in what you remember from the headlines you read. In the following section, you will see 2-3 word phrases that may or may not have appeared in the headlines you read earlier in the study. After reading each phrase, please indicate either: 'Yes' - meaning you saw this phrase in a headline, or 'No' - meaning you did not see this phrase in a headline." To better ensure attention across the entirety of these headlines, care was taken to ensure that different parts of each headline (beginning, middle, end phrases) were equally represented. Some examples of these phrases include: "causes union talks", "make cocaine legal", "laborious endeavors", "in the sky", and "has new idea". Notably, phrase placement within a headline did not impact signal detection performance.

Outcomes
The first outcome was referred to in the main article as headline selection. This outcome reflects the article chosen, or clicked on, by participants. If a participant chose one of our manipulated, or "target," headlines, this response was coded as 1, and if a participant selected one of our control headlines, this response was coded as 0.
To measure recognition memory using a signal detection task, a "sensitivity score" (12) was calculated by measuring the distance (in standard deviations) between the hit and foil distributions, a measure known as d' (d-prime). Thus, sensitivity (d') can be conceptually interpreted as participants' ability to discern hits (i.e., signals) from foils (i.e., noise), in which higher scores reflect better sensitivity. We opted for this behavioral measure of attention because it is less prone to demand characteristics than self-report measures of attention (see General Discussion in the main text for more information and commentary on this matter).
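For readers unfamiliar with d', a minimal sketch of the standard computation follows: d' = z(hit rate) - z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution function. The function name `d_prime`, the example counts, and the log-linear correction (adding 0.5 to each count and 1 to each total so that rates of exactly 0 or 1 remain finite) are assumptions for illustration; the source does not state which correction, if any, was applied.

```python
from statistics import NormalDist

def d_prime(hits, n_signal, false_alarms, n_noise):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    hits:         "Yes" responses to phrases that did appear (signals)
    false_alarms: "Yes" responses to phrases that did not appear (foils)
    """
    # Log-linear correction (assumption): keeps both rates strictly
    # between 0 and 1 so the inverse normal CDF is defined.
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical participants, each judging 12 signal and 12 foil phrases.
guessing = d_prime(hits=6, n_signal=12, false_alarms=6, n_noise=12)
attentive = d_prime(hits=10, n_signal=12, false_alarms=2, n_noise=12)
```

A participant who says "Yes" equally often to signals and foils receives a d' of zero, whereas one who endorses mostly signals and few foils receives a positive d', consistent with the interpretation that higher scores reflect better sensitivity.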

Robustness Check
Similar to the post-hoc analyses run for Study sets 1 and 2, for Study 3 we ran an exploratory post-hoc analysis to assess whether the relationship between condition assignment and signal detection task performance was maintained when accounting for crowd workers' news reading habits and level of education. Specifically, a regression was run using condition assignment, news interest, news reading frequency, and level of education as predictors, with d' scores as the dependent variable. The relationship between condition assignment and signal detection task performance remained significant, and in the expected direction, B = -0.45, SE = 0.08, t = -6.03, p < .001, even when controlling for news habits and level of education. Thus, language simplicity facilitates attention above and beyond what would be expected by daily news reading habits or education.

Limitations
The goal of this research was to understand the reading habits of general news readers. To obtain this information, we relied on crowd workers who were compensated for their participation. Although using crowdsourced workers has become commonplace in the social sciences, there are known issues regarding the representativeness of this sample and potential concerns about data quality (51). Thus, we want to acknowledge these limitations.

Participants
Because the occupational demographics of this sample were of interest to Study 4, we provide a table of the professional characteristics of this sample. Despite the availability of some professional information, given that this study was voluntary and did not provide any monetary incentives, it is not surprising that a lot of these data were missing. Nevertheless, we report the data we have in table S6. We also want to note that some of these data are excluded from the datafile available on OSF to maintain participants' anonymity. If anyone is interested in more details about this information, they are welcome to reach out to the authors of this study.

Procedure
This survey experiment was almost identical to the survey experiment described in Study 3, including all task instructions. After providing consent, participants were presented with the same headline selection task described in Study 3. There were two small differences between these survey experiments. The first was the filler task. While in Study 3 the filler items inquired about general news reading, in this study questions were asked about participants' professional experiences given the purpose of this study. The second difference was that, following the same signal detection task as Study 3, these participants were provided with a short, six-item test. The instructions for this test read, "The Washington Post has conducted lots of A/B tests to find the best headlines for stories. In the next two pages, we provide some of those A/B tests. For each pair of headlines, please select the one that you think had a higher click-rate?" Participants then viewed six pairs of original headlines from The Washington Post to see whether journalists could intuit which headlines were successful. Performance on this test was coded as 0 for an incorrect answer and 1 for a correct answer, and scores were summed to create the accuracy scale reported in the paper (range 0-6). On average, this task took approximately 11.68 minutes to complete (SD = 23.65 minutes, median = 6.73 minutes).

Outcomes
The same data analyses as in Study 3 were used for Study 4. The primary outcomes were headline selection (targets versus controls) and the sensitivity score on the signal detection task.

Supplemental Figures and Tables
fig. S1. Bivariate relationship between simplicity and CTR in The Washington Post (left panel) and CPI in the Upworthy sample (right panel).

table S2. Descriptive statistics across simplicity variables in Study sets 1 and 2.
For the simplicity index creation, the four variables underlying the simplicity index were standardized. N = 19,926 for Study set 1 and N = 105,551 for Study set 2.

table S3. Correlations between simplicity variables in Study sets 1 and 2.
Correlations between all variables that comprise the simplicity index. All correlations are Pearson correlations and based on raw values. *** p < .001

table S5. Experimental headline selection task.
An asterisk denotes original headline language from The Washington Post. The control headlines were all original headlines.

table S6. Characteristics of sample recruited from webinar.
For these items, participants were allowed to select multiple options.

table S7. Results from the principal component analysis and meaning extraction method for The Washington Post.
Components were rotated using the varimax method. λ = eigenvalues. % = amount of variance explained by each component. For this analysis, unigrams (single words), bigrams (two-word phrases), and trigrams (three-word phrases) were all extracted as possible terms for meaning extraction.

table S8. Results from the principal component analysis and meaning extraction method for Upworthy.
Components were rotated using the varimax method. λ = eigenvalues. % = amount of variance explained by each component. For this analysis, unigrams (single words), bigrams (two-word phrases), and trigrams (three-word phrases) were all extracted as possible terms for meaning extraction.